ABSTRACT



The purpose of the Neyman-Johnson statistical 
technique is to determine a region or span of values on r independent 
variables where the predicted criterion scores of two or more 
treatment groups are significantly different. Consequently, the 
technique should prove especially useful in research concerned with 
moderator variables or with the interactions between treatments and 
person variables. The mechanics of the technique are reviewed and 
some extensions mentioned. Three simple examples are given. (Author) 
PREFACE



The Neyman-Johnson technique is an old method with new 
importance, as Professor Aiken's title suggests. In this 
paper the method is explained, with applications to show its 
usefulness for educational research in general and Center- 
related activities in particular. 

Professor Aiken is visiting the Stanford Center during
this academic year as a USOE Postdoctoral Fellow in Educational
Research. During the 1969-1970 academic year he will join the
faculty of Guilford College, Greensboro, N. C., as Professor of
Psychology and Chairman of the Psychology Department.

Richard E. Snow 

Coordinator, Program on Heuristic Teaching 




Interactions Among Group Regressions:
An Old Method in a New Setting 
Lewis B. Aiken, Jr. 

Although multivariate statistical methods appropriate 
for the analysis of educational and psychological data are 
now readily available, some of the potentially most useful 
methods are unfamiliar to many researchers in education and 
psychology. One such example is the Neyman-Johnson technique
for testing differences among group regressions, a
statistical procedure introduced over 30 years ago (Johnson
& Neyman, 1936) and extended somewhat during the ensuing
years (Abelson, 1953; Potthoff, 1964) but still not commonly
known.

To be sure, there are examples in the older literature
of studies which have employed the Neyman-Johnson technique
(Hauisen, 1944; D. A. Johnson, 1949; H. C. Johnson, 1944;
Johnson & Fay, 1950; Johnson & Hoyt, 1947), but these papers,
written by a few sophisticates, are either insufficiently
clear on how the technique was employed or perhaps too replete
with complicated symbolism for the majority of readers who
might find the technique useful.

In its most general formulation, the Neyman-Johnson
technique is a procedure for determining into which of two
or more categories (e.g., treatment conditions) an individual
with a certain set of scores on r independent (control)
variables should be placed in order to maximize his score on a
criterion variable. The problem is a contemporary one,
considering the current interest in moderator variables and
aptitude-treatment interactions. One purpose of the present
paper is to indicate that in these types of investigations
there are alternatives to tests for parallelism (common slope)
of a set of regression lines.

Formulation and Extensions

In the original formulation of the problem (Johnson &
Neyman, 1936), two treatment groups (1 and 2), two independent
variables ($x_1$ and $x_2$) and one dependent variable (y) were
specified. The task is to find the set(s) of points ($x_1$, $x_2$),
in the space having the two independent variables as axes, in
which the y value predicted from the regression equation for
group 2 is significantly larger or smaller than the y value
predicted from the regression equation for group 1. Such
sets of points or regions are specified by a quadratic equation
which plots as a conic section (ellipsoid). Abelson (1953),
in a rather lucid presentation of the mechanics of the technique,
generalized it to three or more independent variables. A
further extension by Potthoff (1964), which is somewhat more
difficult to follow because of an error in his formula 2.4,
considered the procedure when the number of groups is greater
than two and the number of criterion variables greater than
one. Potthoff also argued for slightly different procedures
depending on the research question. Thus, different
computations are involved when one wishes to determine whether two
treatment conditions are different for a certain point
($x_1$, $x_2$, . . ., $x_r$) in the region of significance demarcated
by the Neyman-Johnson technique, in contrast to determining whether
the two treatments are different simultaneously for all points
in the region. In addition, Potthoff recommended the
construction of confidence limits for the difference between the
regression equations of the two groups as a feasible alternative to
the plotting of regions of significance, especially when the
number of independent variables is greater than two. Finally,
Potthoff cautioned that the Neyman-Johnson technique may result
in significance regions that are too small or outside the range
of actual values on the independent variables, or confidence
limits that are too broad to be of use. This is particularly
likely when the numbers of cases in the treatment groups are
small and/or the residual variances in the regression analyses
of the scores of the two groups are large.

Preliminary Tests

Abelson (1953) suggested a list of steps or assumptions
that may serve as a guide to when the Neyman-Johnson technique
should be used. Given two groups (1 and 2):

1. Determine whether the residual variances (the variances
of the observed y's about the regression surface) are
significantly different for the two groups.

2. If the residual variances are not significantly different,
test for parallelism of the regressions of the two
groups, i.e., $(\beta_{11}, \beta_{21}, \ldots, \beta_{r1}) = (\beta_{12}, \beta_{22}, \ldots, \beta_{r2})$.

3. If the regressions are not significantly non-parallel, test
for equality of intercepts of the two groups, i.e.,
$\beta_{01} = \beta_{02}$. If the regressions are significantly
non-parallel, use the Neyman-Johnson technique.



Statistics of the Neyman-Johnson Technique 
In order to make the statistical procedure more comprehensible
to a wider audience and more consistent with general
statistical notation, Abelson's (1953) and Potthoff's (1964)
notations have been modified to some extent in the present
paper. The majority of the required values can be obtained
from a conventional multiple regression program and the
additional procedures carried out quite easily on a computer or
a desk calculator.

Let subscript i stand for the ith independent variable
(i = 1, 2, . . ., r), subscript j the jth group (j = 1, 2),
and subscript k the kth person (k = 1, 2, . . ., $n_j$). Then
$x_{ijk}$ is the raw score of person k in group j on independent
variable i, and $y_{jk}$ is that person's score on the dependent
(criterion) variable. The $n_j$ by (r + 1) matrix of scores
and the vector of $n_j$ scores $y_{jk}$ are:

$$
X_j = \begin{bmatrix}
1 & x_{1j1} & x_{2j1} & \cdots & x_{rj1} \\
1 & x_{1j2} & x_{2j2} & \cdots & x_{rj2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{1jn_j} & x_{2jn_j} & \cdots & x_{rjn_j}
\end{bmatrix},
\qquad
y_j = \begin{bmatrix} y_{j1} \\ y_{j2} \\ \vdots \\ y_{jn_j} \end{bmatrix}
$$

The vector of intercepts and partial regression coefficients
for group j is computed as $b_j = (X_j'X_j)^{-1}X_j'y_j$, and then

$$(b_2 - b_1)' = (b_{02} - b_{01},\; b_{12} - b_{11},\; \ldots,\; b_{r2} - b_{r1}).$$

The combined residual sum of squares for the two groups
is $ss_e = \sum_{j=1}^{2}(y_j'y_j - b_j'X_j'y_j)$, and defining the pooled
residual degrees of freedom as $f = \sum_{j=1}^{2}(n_j - r - 1)$, the mean
square for error is $ms_e = ss_e/f$. In order to estimate the variance
of the difference between the regression equations for the two
groups, compute $V = ms_e[(X_1'X_1)^{-1} + (X_2'X_2)^{-1}]$. Finally, let
vector $x' = (x_0, x_1, x_2, \ldots, x_r)$ be a list of hypothetical
raw scores on the independent variables, where $x_0 = 1$.
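
The quantities defined in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used in the paper: the function names (`fit_group`, `pooled_quantities`) and the small matrix helpers are hypothetical, and the Gauss-Jordan inverse is adequate only for the small r considered here.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def inverse(a: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting (hypothetical helper)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [v - factor * w for v, w in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def fit_group(scores: Sequence[Sequence[float]],
              y: Sequence[float]) -> Tuple[List[float], Matrix, float]:
    """Return b_j, (X_j'X_j)^-1 and the residual sum of squares for one group."""
    X = [[1.0] + [float(v) for v in row] for row in scores]  # x_0 = 1
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))
    xty = [sum(p * q for p, q in zip(col, y)) for col in Xt]
    b = [sum(p * q for p, q in zip(row, xty)) for row in xtx_inv]
    ss = sum(v * v for v in y) - sum(p * q for p, q in zip(b, xty))  # y'y - b'X'y
    return b, xtx_inv, ss


def pooled_quantities(scores1, y1, scores2, y2):
    """Return (b_2 - b_1, V, ms_e, f) for two groups."""
    b1, inv1, ss1 = fit_group(scores1, y1)
    b2, inv2, ss2 = fit_group(scores2, y2)
    r = len(b1) - 1
    f = (len(y1) - r - 1) + (len(y2) - r - 1)
    ms_e = (ss1 + ss2) / f
    V = [[ms_e * (p + q) for p, q in zip(row1, row2)]
         for row1, row2 in zip(inv1, inv2)]
    diff = [q - p for p, q in zip(b1, b2)]
    return diff, V, ms_e, f
```

Each group is passed as a list of rows of raw predictor scores plus a list of criterion scores; the column of ones for $x_0$ is added inside `fit_group`.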

Finding Regions of Significance and Confidence Limits 

There are two possibilities to consider in setting up a
quadratic equation for determining the x region(s) of
significance or confidence limits for the differences between predicted
y's. To find a critical region such that, with confidence
$100(1 - \alpha)$, it can be stated that the two groups are different
for any individual point in the region, compute:

(1) $x'[(b_2 - b_1)(b_2' - b_1') - t^2_{f;\alpha/2}V]x \ge 0.$

On the other hand, to find a critical region such that, with
confidence $100(1 - \alpha)$, it can be stated that the two groups
are different simultaneously for all points contained in the
region, compute:

(2) $x'[(b_2 - b_1)(b_2' - b_1') - (r + 1)F_{r+1,f;\alpha}V]x \ge 0.$

Potthoff suggested that, since plotting regions by use
of the above equations is so tedious, the investigator may
simply settle for constructing confidence limits for the
expression $(b_2' - b_1')x$ as:

(3) $(b_2' - b_1')x \pm t_{f;\alpha/2}(x'Vx)^{1/2}$, or

(4) $(b_2' - b_1')x \pm \sqrt{(r + 1)F_{r+1,f;\alpha}(x'Vx)}$,

corresponding to the region formulas given in 1 and 2,
respectively. For a given set of scores x, if formula 3 does
not include 0 then it can be stated with $100(1 - \alpha)$ per cent
confidence that the predicted criterion scores corresponding
to x are significantly different in the two groups. Formula
4 allows the investigator to make a similar statement
simultaneously for all points x in the critical region of formula 2.
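
As a rough illustration of formulas 3 and 4, the following Python sketch computes the confidence limits for a single vector of scores and reports whether they exclude 0. The function names are hypothetical, and the critical value is assumed to be looked up by the caller from t or F tables: $t_{f;\alpha/2}$ for formula 3, or $\sqrt{(r + 1)F_{r+1,f;\alpha}}$ for formula 4.

```python
import math
from typing import List, Sequence, Tuple


def quad_form(x: Sequence[float], V: Sequence[Sequence[float]]) -> float:
    """Return x'Vx."""
    n = len(x)
    return sum(x[i] * V[i][j] * x[j] for i in range(n) for j in range(n))


def difference_limits(diff: Sequence[float], V: Sequence[Sequence[float]],
                      x: Sequence[float], crit: float) -> Tuple[float, float]:
    """Confidence limits for (b_2 - b_1)'x.

    diff is b_2 - b_1, x starts with x_0 = 1, and crit is the tabled
    critical value supplied by the caller (an assumption of this sketch).
    """
    centre = sum(d * v for d, v in zip(diff, x))
    half_width = crit * math.sqrt(quad_form(x, V))
    return centre - half_width, centre + half_width


def groups_differ(diff, V, x, crit) -> bool:
    """True when the limits exclude 0, i.e. x lies inside a significance region."""
    lower, upper = difference_limits(diff, V, x, crit)
    return lower > 0.0 or upper < 0.0
```

Checking whether the limits exclude 0 is the same test as the sign of the quadratic form in formulas 1 and 2, apart from points lying exactly on the boundary.
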
Plotting the Critical Region(s)

The quadratic equations of formulas 1 and 2 above describe
conic sections (ellipsoids) and will give two significance
regions: one where the predicted value of y in group 2
is larger than in group 1 and the other where the predicted y
value in group 1 is larger than in group 2. The boundaries
of these regions are not difficult to compute and plot when r,
the number of independent variables, is less than three. This
can be accomplished most efficiently when r = 2 by substituting
successive equally spaced values of $x_2$ into the quadratic
equation, setting the equation equal to 0, and applying the
quadratic formula $x_1 = (-b \pm \sqrt{b^2 - 4ac})/2a$ to determine the
boundary values of $x_1$ for the given value of $x_2$.$^1$
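
The scanning procedure just described for r = 2 might be sketched as follows. The names `boundary_x1` and `scan_boundary` are hypothetical, V is assumed symmetric, and `crit_sq` stands for the squared critical value, $t^2_{f;\alpha/2}$ in formula 1 or $(r + 1)F_{r+1,f;\alpha}$ in formula 2, supplied by the caller.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple


def boundary_x1(diff: Sequence[float], V: Sequence[Sequence[float]],
                crit_sq: float, x2: float) -> Optional[Tuple[float, float]]:
    """Boundary values of x1 at a fixed x2, where x = (1, x1, x2).

    Solves x'[(b_2 - b_1)(b_2 - b_1)' - crit_sq * V]x = 0 for x1 by the
    quadratic formula. Returns None when there is no real boundary.
    """
    A = [[diff[i] * diff[j] - crit_sq * V[i][j] for j in range(3)]
         for i in range(3)]
    a = A[1][1]
    b = 2.0 * (A[0][1] + A[1][2] * x2)
    c = A[0][0] + 2.0 * A[0][2] * x2 + A[2][2] * x2 * x2
    if a == 0.0:
        return None  # not a quadratic in x1; left unhandled in this sketch
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    first = (-b - root) / (2.0 * a)
    second = (-b + root) / (2.0 * a)
    return min(first, second), max(first, second)


def scan_boundary(diff, V, crit_sq: float,
                  x2_values: Iterable[float]) -> List[Tuple[float, Optional[Tuple[float, float]]]]:
    """Pair each x2 value with its boundary values of x1 (or None)."""
    return [(x2, boundary_x1(diff, V, crit_sq, x2)) for x2 in x2_values]
```

Plotting the returned pairs for a grid of $x_2$ values traces the boundary of the region; which side of the boundary is significant still has to be decided by evaluating the quadratic form at a trial point.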

Elaborations on the Technique 
Although Abelson (1953) and Potthoff (1964) do not
explicitly mention the fact, the Neyman-Johnson technique can easily
be extended downward to cover the case of one independent
variable. In this case the computations are much simpler, since

(5) $x'Vx = ms_e \sum_{j=1}^{2} \dfrac{n_j x^2 - 2x\sum_k x_{jk} + \sum_k x_{jk}^2}{n_j\sum_k x_{jk}^2 - \left(\sum_k x_{jk}\right)^2}$

and the other needed quantities may be computed from simple
linear regression formulas. The significance region(s) in this
case, however, will be demarcated by lines parallel to the y
axis (see Figures 1 and 2).

$^1$ Of course, when r = 1, simply setting the one-variable
quadratic equation equal to zero and finding the roots by the
quadratic formula will do the trick.
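
For the one-variable case, the boundaries can be found directly from the group sums. The sketch below substitutes the expression for $x'Vx$ with one independent variable into the region formula and collects terms in x; the function name and argument layout are hypothetical, and `crit_sq` is again the squared critical value supplied by the caller.

```python
import math
from typing import Optional, Sequence, Tuple


def one_variable_boundaries(d0: float, d1: float, ms_e: float,
                            groups: Sequence[Tuple[float, float, float]],
                            crit_sq: float) -> Optional[Tuple[float, float]]:
    """Roots in x of (d0 + d1*x)^2 - crit_sq * x'Vx = 0 when r = 1.

    d0 and d1 are the differences in intercepts and slopes (group 2 minus
    group 1); groups holds one (n_j, sum of x, sum of x squared) per group.
    Returns None when the quadratic has no real roots.
    """
    sum_n = sum_s = sum_q = 0.0
    for n, s, q in groups:
        denom = n * q - s * s
        sum_n += n / denom
        sum_s += s / denom
        sum_q += q / denom
    k = crit_sq * ms_e
    a = d1 * d1 - k * sum_n
    b = 2.0 * d0 * d1 + 2.0 * k * sum_s
    c = d0 * d0 - k * sum_q
    if a == 0.0:
        return None  # degenerate case, not handled in this sketch
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    first = (-b - root) / (2.0 * a)
    second = (-b + root) / (2.0 * a)
    return min(first, second), max(first, second)
```

The two roots are the x values at which the vertical boundary lines are drawn. When the leading coefficient `a` is positive the quadratic is non-negative outside the roots, and when it is negative the quadratic is non-negative between them.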

As was indicated above, Potthoff (1964) extended the
Neyman-Johnson technique to g groups and p criterion variables.
These extensions consist of pairwise comparisons of the g groups
on the p criteria and require the plotting of pg(g - 1)/2 regions,
one for each pair. The extensions are straightforward and do
not involve computations greatly different from those detailed
above (see Appendix).
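
The bookkeeping for the pg(g - 1)/2 regions can be made concrete with a short enumeration. This is only an illustrative sketch; the function name `region_index` and the numbering of groups and criteria from 1 are assumptions.

```python
from itertools import combinations
from typing import List, Tuple


def region_index(g: int, p: int) -> List[Tuple[int, int, int]]:
    """List one (criterion, group j, group J) triple per region, with J > j."""
    pairs = list(combinations(range(1, g + 1), 2))
    return [(i, j, J) for i in range(1, p + 1) for j, J in pairs]
```

With g = 3 groups and p = 2 criteria, for instance, the list has 2 x 3 = 6 entries, one for each region to be plotted.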

Examples 

Three examples of applying the Neyman-Johnson technique
will be given: two where r = 1 and one where r = 2. In a
dissertation study at Stanford University by Mary Lou Koran,
student teachers were exposed to one of two kinds of
information between microteaching sessions. The 40 students in group
2 were exposed to a film portrayal of the particular teaching
skill to be learned, and the 40 students in group 1 read a
verbatim text of the sound track of the film. The skill to
be learned was the formulation of analytic questions by the
teacher during class discussion. The independent variables
are the Hidden Figures test from the Kit of Selected Reference
Aptitude and Achievement Factors (French, 1963) and a test
called Film Memory. The regressions of the criterion (Total
Number of Analytic Questions) scores on Hidden Figures scores
are illustrated in Figure 1, and the regressions of the
criterion on Film Memory are illustrated in Figure 2. The broken
vertical lines in the figures demarcate regions of significance
(.95 level, formula 2) where the predicted criterion scores
of treatment group 2 are significantly different from those
of treatment group 1.

Fig. 1. Regions of significance in example 1 with one independent
variable. (See text for explanation.)

Fig. 2. Regions of significance in example 2 with one independent
variable. (See text for explanation.)

In the area to the left of the broken vertical line in
Figure 1 (very low scores on Hidden Figures), the predicted
criterion scores of group 2 are significantly higher than those
of group 1. A similar area where group 1 is superior to group
2 in predicted criterion scores lies to the right of Figure 1,
but since this area contains no actual data points it is not
shown in the illustration.

In the area to the right of the broken vertical line in
Figure 2 (very high scores on Film Memory), the predicted
criterion scores of group 2 are significantly higher than those
of group 1. A similar area where group 1 is superior to group
2 in predicted criterion scores lies to the left of Figure 2,
but since this area contains no actual data points it is not
shown in the illustration.

Since it is obvious that the regression slopes are quite
different for the two groups in Figures 1 and 2, it was decided
to analyze the combined effects of Hidden Figures and Film
Memory on Total Analytic Questions. Figure 3 illustrates the
solution when both independent variables are considered. The
75% region of significance where group 1 is superior to group
2 is off the graph to the right and contains no data points.$^2$
Of course, the fact that the predicted criterion scores of one
group are superior to those of another group in a given region
does not necessarily imply that a given treatment should be
adopted for all examinees whose x scores fall in that region.
Which treatment should be employed with a given individual
depends not only on the probability of his scores falling within
a certain treatment region but also on such factors as cost
and convenience of the treatment.

$^2$ Since the correlations between the independent variables
were essentially zero for both groups in this study, the
significance regions in Figure 3 can be roughly predicted from the
results depicted in Figures 1 and 2.

Fig. 3. Regions of significance in example 3 with two independent
variables. (See text for explanation.)

Appendix

Neyman-Johnson Technique for More Than Two Groups

Let there be g groups, with (j, J) being any of the
m = g(g - 1)/2 pairs (J > j). In expressions for $ss_e$ and $ms_e$,
[remainder of sentence illegible in source]. Formulas for obtaining
simultaneous confidence intervals for $(b_J' - b_j')x$ for all
possible (j, J) but for a single x are:

(6) $(b_J' - b_j')x \pm \sqrt{(g - 1)F_{g-1,f;\alpha}\,x'V_{jJ}x}$, or

(7) $(b_J' - b_j')x \pm$ [illegible in source], whichever is smaller.

To obtain simultaneous confidence intervals for all x, use:

(8) $(b_J' - b_j')x \pm \sqrt{(r + 1)(g - 1)F_{(r+1)(g-1),f;\alpha}\,x'V_{jJ}x}$, or

(9) $(b_J' - b_j')x \pm \sqrt{(r + 1)F_{r+1,f;\alpha/m}\,x'V_{jJ}x}$, whichever is
smaller. Formulas for the m = g(g - 1)/2 regions corresponding
to the confidence intervals described above are, for formula
9 for example:

(10) $x'[(b_J - b_j)(b_J' - b_j') - (r + 1)F_{r+1,f;\alpha/m}V_{jJ}]x \ge 0.$

Formulas for regions corresponding to the confidence limits of
formulas 6, 7, and 8 may be written in similar fashion.

More Than One Criterion Variable

The problem is to obtain simultaneous confidence limits
for the differences $(b_J^{i\prime} - b_j^{i\prime})x$, where i is the "ith" criterion
variable (i = 1, 2, . . ., p). An appropriate formula is:

(11) $(b_J^{i\prime} - b_j^{i\prime})x \pm \sqrt{(r + 1)(g - 1)F_{(r+1)(g-1),f;\alpha/p}\,x'V_{jJ}^{i}x}$, or

(12) $(b_J^{i\prime} - b_j^{i\prime})x \pm \sqrt{(r + 1)F_{r+1,f;\alpha/mp}\,x'V_{jJ}^{i}x}$, whichever is
smaller; $b_J^i - b_j^i$ and $ms_e^i$ are the same functions of the $y_{jk}^i$ as
$b_J - b_j$ and $ms_e$ are, respectively, of the $y_{jk}$'s. A region
approach is equivalent to the above, but there will be p times
as many regions as in the univariate case of one y variable.



